Combining Models from Multiple Sources for RGB-D Scene Recognition
نویسندگان
چکیده
Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine tuned to the respective target RGB and depth datasets. These approaches have several limitations: 1) only use low-level filters learned from RGB data, thus not being able to exploit properly depth-specific patterns, and 2) RGB and depth features are only combined at highlevels but rarely at lower-levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets together with depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.
منابع مشابه
Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs
Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so often leverages RGB large datasets, by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that ...
متن کاملIntegrating 3D structure into traffic scene understanding with RGB-D data
RGB Video now is one of the major data sources of traffic surveillance applications. In order to detect the possible traffic events in the video, traffic-related objects, such as vehicles and pedestrians, should be first detected and recognized. However, due to the 2D nature of the RGB videos, there are technical difficulties in efficiently detecting and recognizing traffic-related objects from...
متن کاملEnhanced RGB-D Mapping Method for Detailed 3D Indoor and Outdoor Modeling
RGB-D sensors (sensors with RGB camera and Depth camera) are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks including limited measurement ranges (e.g., within 3 m) and errors in depth measurement increase with distance from the sensor with respect to 3D dense m...
متن کاملHand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
متن کاملContextual object category recognition for RGB-D scene labeling
Recent advances in computer vision on the one hand, and imaging technologies on the other hand, have opened up a number of interesting possibilities for robust 3D scene labeling. This paper presents contributions in several directions to improve the state-of-the-art in RGB-D scene labeling. First, we present a novel combination of depth and color features to recognize different object categorie...
متن کامل